
    Improved Finite Blocklength Converses for Slepian-Wolf Coding via Linear Programming

    A new finite blocklength converse for the Slepian-Wolf coding problem is presented which significantly improves on the best known converse for this problem, due to Miyake and Kanaya [2]. To obtain this converse, an extension of the linear programming (LP) based framework for finite blocklength point-to-point coding problems from [3] is employed. However, a direct application of this framework demands a complicated analysis for the Slepian-Wolf problem. An analytically simpler approach is presented wherein LP-based finite blocklength converses for this problem are synthesized from point-to-point lossless source coding problems with perfect side-information at the decoder. New finite blocklength metaconverses for these point-to-point problems are derived by employing the LP-based framework, and the new converse for Slepian-Wolf coding is obtained by an appropriate combination of these converses. Comment: under review with the IEEE Transactions on Information Theory.
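    As background for the converse results above, the classical asymptotic Slepian-Wolf rate region is the limit that any finite-blocklength converse must be consistent with; it is recalled here as standard material, not as the paper's new bound.

```latex
% Classical asymptotic Slepian-Wolf rate region for jointly distributed
% sources (X_1, X_2), encoded separately and decoded jointly:
\begin{align*}
  R_1       &\ge H(X_1 \mid X_2), \\
  R_2       &\ge H(X_2 \mid X_1), \\
  R_1 + R_2 &\ge H(X_1, X_2).
\end{align*}
```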

    Information-Theoretic Bounds on Transfer Generalization Gap Based on Jensen-Shannon Divergence

    In transfer learning, training and testing data sets are drawn from different data distributions. The transfer generalization gap is the difference between the population loss on the target data distribution and the training loss. The training data set generally includes data drawn from both source and target distributions. This work presents novel information-theoretic upper bounds on the average transfer generalization gap that capture (i) the domain shift between the target data distribution $P'_Z$ and the source distribution $P_Z$ through a two-parameter family of generalized $(\alpha_1,\alpha_2)$-Jensen-Shannon (JS) divergences; and (ii) the sensitivity of the transfer learner output $W$ to each individual sample $Z_i$ of the data set via the mutual information $I(W;Z_i)$. For $\alpha_1 \in (0,1)$, the $(\alpha_1,\alpha_2)$-JS divergence can be bounded even when the support of $P_Z$ is not included in that of $P'_Z$. This contrasts with the Kullback-Leibler (KL) divergence $D_{KL}(P_Z \| P'_Z)$-based bounds of Wu et al. [1], which are vacuous in this case. Moreover, the obtained bounds hold for unbounded loss functions with bounded cumulant generating functions, unlike the $\phi$-divergence based bound of Wu et al. [1]. We also obtain new upper bounds on the average transfer excess risk in terms of the $(\alpha_1,\alpha_2)$-JS divergence for empirical weighted risk minimization (EWRM), which minimizes the weighted average training losses over source and target data sets. Finally, we provide a numerical example to illustrate the merits of the introduced bounds. Comment: Submitted for conference publication.
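    The support property invoked above can be illustrated numerically with the standard (unweighted) Jensen-Shannon divergence; this sketch does not implement the paper's $(\alpha_1,\alpha_2)$-generalization, and the function names and example distributions are purely illustrative.

```python
import numpy as np

def kl(p, q):
    """KL divergence D(p || q); infinite if supp(p) is not contained in supp(q)."""
    mask = p > 0
    if np.any(q[mask] == 0):
        return np.inf
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def js(p, q):
    """Standard Jensen-Shannon divergence: always finite, at most log(2)."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Source distribution P_Z puts mass on a symbol the target P'_Z never produces.
p_source = np.array([0.5, 0.5, 0.0])
p_target = np.array([0.0, 0.5, 0.5])

print(kl(p_source, p_target))  # inf -> KL-based bounds become vacuous
print(js(p_source, p_target))  # finite -> JS-based bounds remain informative
```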

    Address-Event Variable-Length Compression for Time-Encoded Data

    Time-encoded signals, such as social network update logs and spiking traces in neuromorphic processors, are defined by multiple traces carrying information in the timing of events, or spikes. When time-encoded data is processed at a site remote from the location where it is produced, the occurrence of events needs to be encoded and transmitted in a timely fashion. The standard Address-Event Representation (AER) protocol for neuromorphic chips encodes the indices of the "spiking" traces in the payload of a packet produced at the same time the events are recorded, hence implicitly encoding the events' timing in the timing of the packet. This paper investigates the potential bandwidth saving that can be obtained by carrying out variable-length compression of the packets' payloads. Compression leverages both intra-trace and inter-trace correlations over time that are typical in applications such as social networks or neuromorphic computing. The approach is based on discrete-time Hawkes processes and entropy coding with conditional codebooks. Results from an experiment based on a real-world retweet dataset are also provided. Comment: submitted.
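    A minimal sketch of the kind of discrete-time multivariate Hawkes model the compression scheme relies on, assuming a geometrically decaying memory and a sigmoid link; both choices, along with all parameter values, are illustrative assumptions rather than the paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(0)

num_traces, horizon = 4, 200
baseline = np.full(num_traces, -3.0)                    # per-trace base log-odds of spiking
coupling = 0.8 * rng.random((num_traces, num_traces))   # influence of trace j on trace i
decay = 0.7                                             # geometric memory decay per step

memory = np.zeros(num_traces)                           # decayed count of recent events
spikes = np.zeros((horizon, num_traces), dtype=int)

for t in range(horizon):
    logits = baseline + coupling @ memory               # intra- and inter-trace excitation
    prob = 1.0 / (1.0 + np.exp(-logits))                # spiking probability per trace
    spikes[t] = rng.random(num_traces) < prob
    memory = decay * memory + spikes[t]                 # update the decayed event memory

# A conditional entropy coder would use `prob` at each step as the predictive
# distribution of its codebook, spending roughly -log2(prob) bits per event.
print(spikes.sum(axis=0))
```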

    Information-Theoretic Generalization Bounds for Meta-Learning and Applications

    Meta-learning, or "learning to learn", refers to techniques that infer an inductive bias from data corresponding to multiple related tasks with the goal of improving the sample efficiency for new, previously unobserved, tasks. A key performance measure for meta-learning is the meta-generalization gap, that is, the difference between the average loss measured on the meta-training data and on a new, randomly selected task. This paper presents novel information-theoretic upper bounds on the meta-generalization gap. Two broad classes of meta-learning algorithms are considered that uses either separate within-task training and test sets, like MAML, or joint within-task training and test sets, like Reptile. Extending the existing work for conventional learning, an upper bound on the meta-generalization gap is derived for the former class that depends on the mutual information (MI) between the output of the meta-learning algorithm and its input meta-training data. For the latter, the derived bound includes an additional MI between the output of the per-task learning procedure and corresponding data set to capture within-task uncertainty. Tighter bounds are then developed, under given technical conditions, for the two classes via novel Individual Task MI (ITMI) bounds. Applications of the derived bounds are finally discussed, including a broad class of noisy iterative algorithms for meta-learning.Comment: Accepted to Entrop

    Transfer Bayesian Meta-learning via Weighted Free Energy Minimization

    Meta-learning optimizes the hyperparameters of a training procedure, such as its initialization, kernel, or learning rate, based on data sampled from a number of auxiliary tasks. A key underlying assumption is that the auxiliary tasks, known as meta-training tasks, share the same generating distribution as the tasks to be encountered at deployment time, known as meta-test tasks. This may, however, not be the case when the test environment differs from the meta-training conditions. To address shifts in the task generating distribution between meta-training and meta-testing phases, this paper introduces weighted free energy minimization (WFEM) for transfer meta-learning. We instantiate the proposed approach for non-parametric Bayesian regression and classification via Gaussian Processes (GPs). The method is validated on a toy sinusoidal regression problem, as well as on classification using miniImagenet and CUB data sets, through comparison with standard meta-learning of GP priors as implemented by PACOH. Comment: 9 pages, 5 figures, Accepted to IEEE International Workshop on Machine Learning for Signal Processing 202
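    A minimal sketch of the per-task GP regression model used in the toy sinusoidal experiment, written with scikit-learn; the kernel choice, noise level, and task parameters are illustrative assumptions, and the weighted free energy minimization over meta-training tasks itself is not shown here.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(1)
amplitude, phase = 2.0, 0.3                      # task-specific parameters (illustrative)

# A few noisy samples from one sinusoidal task.
x_train = rng.uniform(-5.0, 5.0, size=(8, 1))
y_train = amplitude * np.sin(x_train[:, 0] + phase) + 0.1 * rng.standard_normal(8)

# GP prior whose hyperparameters (scale, length-scale) a meta-learner could adapt.
kernel = ConstantKernel(1.0) * RBF(length_scale=1.0)
gp = GaussianProcessRegressor(kernel=kernel, alpha=1e-2)
gp.fit(x_train, y_train)

x_test = np.linspace(-5.0, 5.0, 100).reshape(-1, 1)
mean, std = gp.predict(x_test, return_std=True)  # posterior mean and uncertainty
print(mean.shape, std.shape)                     # (100,) (100,)
```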

    An Information-Theoretic Analysis of the Impact of Task Similarity on Meta-Learning

    Meta-learning aims at optimizing the hyperparameters of a model class or training algorithm from the observation of data from a number of related tasks. Following the setting of Baxter [1], the tasks are assumed to belong to the same task environment, which is defined by a distribution over the space of tasks and by per-task data distributions. The statistical properties of the task environment thus dictate the similarity of the tasks. The goal of the meta-learner is to ensure that the hyperparameters yield a small loss when used for training on a new task sampled from the task environment. The difference between the resulting average loss, known as the meta-population loss, and the corresponding empirical loss measured on the available data from related tasks, known as the meta-generalization gap, is a measure of the generalization capability of the meta-learner. In this paper, we present novel information-theoretic bounds on the average absolute value of the meta-generalization gap. Unlike prior work [2], our bounds explicitly capture the impact of task relatedness, the number of tasks, and the number of data samples per task on the meta-generalization gap. Task similarity is gauged via the Kullback-Leibler (KL) and Jensen-Shannon (JS) divergences. We illustrate the proposed bounds on the example of ridge regression with a meta-learned bias. Comment: Submitted for Conference Publication.
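    A minimal sketch of the ridge-regression-with-meta-learned-bias example mentioned at the end of the abstract, assuming the regularizer is centered at a bias vector u supplied by the meta-learner; the function name, constants, and synthetic data are illustrative, not the paper's setup.

```python
import numpy as np

def ridge_with_bias(X, y, u, lam):
    """Ridge regression whose regularizer is centered at a (meta-learned) bias u:
       argmin_w ||X w - y||^2 + lam * ||w - u||^2,
    with closed-form solution (X^T X + lam I)^{-1} (X^T y + lam u)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * u)

# The closer the meta-learned bias u is to the task's true weights,
# the better the fit for a fixed amount of per-task data.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
X = rng.standard_normal((10, 3))
y = X @ w_true + 0.1 * rng.standard_normal(10)

w_zero_bias = ridge_with_bias(X, y, u=np.zeros(3), lam=5.0)
w_good_bias = ridge_with_bias(X, y, u=w_true, lam=5.0)
print(np.linalg.norm(w_zero_bias - w_true), np.linalg.norm(w_good_bias - w_true))
```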